[HBASE-26534] Update HBase version to 2.4, make Hadoop 3 and Spark 3 default versions.#88
Conversation
|
💔 -1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
The purpose of this is to match our hbase-server version with the jar provided on EMR. While our HBase is listed as 2.2.6, the library code we have from Amazon is for 2.4.1+. There was a breaking change in 2.4.1 and this mismatch prevents bulkLoadThinRows() from working.
|
This is great. I ran into an issue with running this on AWS EMR and which provides HBase 2.4.1+ library code, and this had exactly the updates I needed. I'd love to see this merged and packaged in a release. |
|
One thing to think about with this: HBase 2.4.1 introduced (via this PR apache/hbase#2800) a breaking change for This PR enables you to use this with HBase 2.4.1 and later, but will prevent you from using it with earlier HBase versions. I know you currently allow the HBase version to be configured via the system properties, but as-is this change won't work with the HBase library prior to 2.4.1. |
|
Thanks @ianawilson for looking at this. |
ianawilson
left a comment
There was a problem hiding this comment.
@LucaCanali Makes sense to me!
I'm not a contributor to this repo, so I don't know how much weight my approval carries, but I've been using this branch successfully in a couple of ad hoc Spark jobs with HBase client 2.4, and it's been great.
|
@LucaCanali FYI, looks like there's a PR with a similar intent and slightly different approach #89 |
|
@LucaCanali do you mind merging my changes from #89 to here? I will review yours instead, and that could help to move upgrade the forward . |
taklwu
left a comment
There was a problem hiding this comment.
did you run the unit test on your local environment ? may I ask if you're using linux not mac?
basically, I had some changes in TestJavaHBaseContext.java and other *Suite.scala , please see them in my PR #89 , that helps to prevent unit test failure on my mac. but it may be only me having these issues. if you can, please help to merge those changes in unit tests as well and IMO it's helping to run it smooth without conflicts between each tests.
spark/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala
Outdated
Show resolved
Hide resolved
|
🎊 +1 overall
This message was automatically generated. |
|
Thanks @taklwu for comments and review. |
|
🎊 +1 overall
This message was automatically generated. |
taklwu
left a comment
There was a problem hiding this comment.
LGTM ! and thanks for merging the changes
|
This is great, thank you both @LucaCanali @taklwu! One question I've got: Is it possible for this update to be released and published to maven central? It seems like there are quite a few updates on this repository that haven't been released there yet. |
This proposes to bump up the default versions used by HBase-connectors to recent versions, in particular for HBase, Spark, and Hadoop:
HBase is upgraded to 2.4.9, Spark to 3.1.2, Hadoop to 3.2.0.
This proposes also to make Spark 3 and Hadoop 3 default version (as opposed to Spark 2 and Hadoop 2).
Additional minor changes to the code are proposed/needed to compile the connector with HBase 2.4.9.